Source: https://www.rstudio.com/
In this practical you’ll practice plotting data with the ggplot2 package.
If you don’t have it already, you can access the ggplot2 cheatsheet here https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf. This has a nice overview of all the major functions in ggplot2.
The following examples show ggplot2 in action. Try to go through each line of code and see how it works!
# -----------------------------------------------
# Examples of using ggplot2 on the mpg data
# ------------------------------------------------
library(tidyverse) # Load tidyverse (which contains ggplot2!)
mpg # Look at the mpg data
# Just a blank space without any aesthetic mappings
ggplot(data = mpg)
# Set the overall plotting theme
theme_set(theme_bw()) # theme_bw(), theme_minimal(), theme_classic()
# Now add a mapping where engine displacement (displ) and highway miles per gallon (hwy) are mapped to the x and y aesthetics
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) # Map displ to x-axis and hwy to y-axis
# Add points with geom_point()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point()
# Add points with geom_count()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_count()
# Again, but with some additional arguments
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy)) +
geom_point(col = "red", # Red points
size = 3, # Larger size
alpha = .5, # Transparent points
position = "jitter") + # Jitter the points
scale_x_continuous(limits = c(1, 15)) + # Axis limits
scale_y_continuous(limits = c(0, 50))
# Assign class to the color aesthetic and add labels with labs()
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, col = class)) + # Change color based on class column
geom_point(size = 3, position = 'jitter') +
labs(x = "Engine Displacement in Liters",
y = "Highway miles per gallon",
title = "MPG data",
subtitle = "Cars with higher engine displacement tend to have lower highway mpg",
caption = "Source: mpg data in ggplot2")
# Add a regression line for each class
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(method = "lm")
# Add a regression line for all classes
ggplot(data = mpg,
mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(size = 3, alpha = .9) +
geom_smooth(col = "blue", method = "lm")
# Facet by class
ggplot(data = mpg,
mapping = aes(x = displ,
y = hwy,
color = factor(cyl))) +
geom_point() +
facet_wrap(~ class)
# Another fancier example
ggplot(data = mpg,
mapping = aes(x = cty, y = hwy)) +
geom_count(aes(color = manufacturer)) + # Add count geom (see ?geom_count)
geom_smooth() + # Smoothed line (a confidence band is drawn by default; add se = FALSE to hide it)
geom_text(data = filter(mpg, cty > 25),
aes(x = cty,y = hwy,
label = rownames(filter(mpg, cty > 25))),
position = position_nudge(y = -1),
check_overlap = TRUE,
size = 5) +
labs(x = "City miles per gallon",
y = "Highway miles per gallon",
title = "City and Highway miles per gallon",
subtitle = "Numbers indicate cars with city mpg > 25",
caption = "Source: mpg data in ggplot2",
color = "Manufacturer",
size = "Counts")
| Dataset | Package |
|---|---|
| ACTG175 | speff2trial |
| diamonds | ggplot2 |
| Davis | car |
| heartdisease | FFTrees |
Load the tidyverse package.
The diamonds dataset in the ggplot2 package shows information about roughly 54,000 round cut diamonds. Print the diamonds dataset; it should look like this:
diamonds
# A tibble: 53,940 x 10
carat cut color clarity depth table price x y z
<dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
7 0.24 Very Good I VVS1 62.3 57 336 3.95 3.98 2.47
8 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
9 0.22 Fair E VS2 65.1 61 337 3.87 3.78 2.49
10 0.23 Very Good H VS1 59.4 61 338 4.00 4.05 2.39
# ... with 53,930 more rows
Create a scatterplot showing the relationship between a diamond’s carat (carat) and its price (price).
Make the points transparent using the alpha argument to geom_point().
Create a separate panel for each level of cut using the facet_wrap() function.
Add a smoothed line with geom_smooth(). (You can also try turning the line into a regression line using the method argument.)
Look at the theme help menu with ?theme_bw() to see a list of all of the standard ggplot themes. Then, using the theme_set() function, try setting your global theme to different themes. When you do, evaluate your previous plotting code again to see the new themes in action!
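One possible way to put the diamonds tasks above together is sketched here. The alpha value and the choice of theme_minimal() are arbitrary assumptions, and the object name p_diamonds is only used for illustration.

```r
library(ggplot2)

# Scatterplot of carat vs. price with transparent points,
# one panel per level of cut, and a regression line in each panel
p_diamonds <- ggplot(data = diamonds,
                     mapping = aes(x = carat, y = price)) +
  geom_point(alpha = .1) +        # alpha = .1 is an arbitrary choice
  facet_wrap(~ cut) +
  geom_smooth(method = "lm")      # drop method to get the default smoother

p_diamonds

# Change the global theme, then evaluate the plot again to see the effect
theme_set(theme_minimal())
p_diamonds
```

Because the plot is stored in an object, re-evaluating it after theme_set() is enough to see the new theme; the plotting code does not need to be retyped.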
The ggthemes package contains many additional themes. If you don’t have the package already, install it. Then, look at the ggthemes vignette by running the following code:
# Open the ggthemes vignette
vignette("ggthemes", package = "ggthemes")
Create a plot of the mpg data using the Five Thirty Eight theme.
geom_density()
Create a density plot of the diamonds data using the following template:
Set the data argument to diamonds
Map carat to the x aesthetic
Add geom_density() and set the fill to "tomato1"
Add theme_minimal()
ggplot(data = XX,
mapping = aes(x = XX)) +
geom_density(fill = "XX") +
labs(x = "XX",
y = "XX",
title = "XX",
subtitle = "XX",
caption = "XX")
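As a rough illustration, the template above could be filled in as follows. The label texts and the object name p_density are my own assumptions, not part of the original exercise.

```r
library(ggplot2)

# Density of diamond carats, filled with "tomato1"
p_density <- ggplot(data = diamonds,
                    mapping = aes(x = carat)) +
  geom_density(fill = "tomato1") +
  labs(x = "Carat",                      # label texts are illustrative
       y = "Density",
       title = "Distribution of diamond carats",
       subtitle = "Most diamonds are small",
       caption = "Source: diamonds data in ggplot2") +
  theme_minimal()

p_density
```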
geom_boxplot()
Look at the help menu for geom_boxplot(). Then, create a boxplot using the following template:
ggplot(data = XX,
mapping = aes(x = XX, y = log(XX), fill = XX)) +
geom_boxplot() +
labs(y = "XX",
x = "XX",
fill = "XX",
title = "XX",
subtitle = "XX") +
scale_fill_brewer(palette = "XX")
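A minimal sketch of one way to fill in the boxplot template is shown below. The variables (cut, price, color), the "Blues" palette, and the label texts are assumptions chosen for illustration; the original exercise may use different ones.

```r
library(ggplot2)

# Boxplots of log price for each cut, split by diamond color
p_box <- ggplot(data = diamonds,
                mapping = aes(x = cut, y = log(price), fill = color)) +
  geom_boxplot() +
  labs(y = "Log price",
       x = "Cut",
       fill = "Color",                   # labels the legend of the fill aesthetic
       title = "Diamond prices by cut and color",
       subtitle = "Prices are shown on a log scale") +
  scale_fill_brewer(palette = "Blues")   # any ColorBrewer palette name works here

p_box
```

Because the grouping variable is mapped to fill, the legend title is set with the fill argument of labs().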
geom_violin()
Now create a violin plot of the same data using geom_violin(). You can also change the color palette in the palette argument to the scale_fill_brewer() function. Look at the help menu with ?scale_fill_brewer() to see all the possibilities. In my version of the plot, I’m using "Set1".
You can use the stat_summary() function to add summary statistics as geoms to plots. Using the following template, create a plot showing the mean prices of diamonds for each level of clarity.
ggplot(data = XX,
mapping = aes(x = XX, y = XX)) +
stat_summary(fun = "mean", # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
geom = "bar",
fill = "white",
col = "black") +
labs(y = "XX",
x = "XX",
color = "XX",
title = "XX",
caption = "XX")
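The stat_summary() template can be sketched as follows, assuming ggplot2 3.3.0 or later (where the argument is called fun rather than fun.y). The label texts and the object name p_mean are illustrative assumptions.

```r
library(ggplot2)

# Bar chart of the mean diamond price for each level of clarity
p_mean <- ggplot(data = diamonds,
                 mapping = aes(x = clarity, y = price)) +
  stat_summary(fun = "mean",      # summary function applied to y within each x
               geom = "bar",      # draw the summaries as bars
               fill = "white",
               col = "black") +
  labs(y = "Mean price",
       x = "Clarity",
       title = "Mean diamond price by clarity",
       caption = "Source: diamonds data in ggplot2")

p_mean

# The same plot with the x and y coordinates flipped
p_mean + coord_flip()
```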
Create a similar plot from the mpg dataframe.
You can flip the axes of a plot with coord_flip(). Using coord_flip(), flip the x and y coordinates of your previous plot.
Create a plot from the mpg dataset, and save it as an object called myplot.
Using the assignment operator <-, add a regression line to the myplot object with geom_smooth(). Then evaluate the object to see the updated version.
Using ggsave(), save the object as a pdf file called myplot.pdf. Set the width to 6 inches, and the height to 4 inches. Open the pdf outside of RStudio to make sure it worked!
Print the midwest dataset and look at the help menu to see what values it contains. It should look like this:
# A tibble: 437 x 28
PID county state area poptotal popdensity popwhite popblack
<int> <chr> <chr> <dbl> <int> <dbl> <int> <int>
1 561 ADAMS IL 0.052 66090 1270.9615 63917 1702
2 562 ALEXANDER IL 0.014 10626 759.0000 7054 3496
3 563 BOND IL 0.022 14991 681.4091 14477 429
4 564 BOONE IL 0.017 30806 1812.1176 29344 127
5 565 BROWN IL 0.018 5836 324.2222 5264 547
6 566 BUREAU IL 0.050 35688 713.7600 35157 50
7 567 CALHOUN IL 0.017 5322 313.0588 5298 1
8 568 CARROLL IL 0.027 16805 622.4074 16519 111
9 569 CASS IL 0.024 13437 559.8750 13384 16
10 570 CHAMPAIGN IL 0.058 173025 2983.1897 146506 16559
# ... with 427 more rows, and 20 more variables: popamerindian <int>,
# popasian <int>, popother <int>, percwhite <dbl>, percblack <dbl>,
# percamerindan <dbl>, percasian <dbl>, percother <dbl>,
# popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
# poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
# percchildbelowpovert <dbl>, percadultpoverty <dbl>,
# percelderlypoverty <dbl>, inmetro <int>, category <chr>
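Returning briefly to the earlier myplot and ggsave() tasks, the workflow of storing a plot, adding a layer to it, and saving it can be sketched like this. The choice of displ and hwy, and writing to a temporary directory, are assumptions made so the example runs anywhere; in your own project you would normally save to a folder of your choice.

```r
library(ggplot2)

# Store a scatterplot of the mpg data in an object
myplot <- ggplot(data = mpg,
                 mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Add a regression line by reassigning the object, then evaluate it
myplot <- myplot + geom_smooth(method = "lm")
myplot

# Save as a 6 x 4 inch pdf (ggsave() measures in inches by default)
out_file <- file.path(tempdir(), "myplot.pdf")   # temporary location, an assumption
ggsave(filename = out_file, plot = myplot, width = 6, height = 4)
```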
ggplot(data = XX,
mapping = aes(x = XX, y = XX)) +
geom_point(aes(fill = XX, size = XX), shape = 21, color = "white") +
geom_smooth(aes(x = XX, y = XX)) +
labs(
x = "XX",
y = "XX",
title = "XX",
subtitle = "XX",
caption = "XX") +
scale_fill_brewer(palette = "XX") +
scale_size(range = c(XX, XX)) +
guides(size = guide_legend(override.aes = list(col = "black")),
fill = guide_legend(override.aes = list(size = 5)))
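Here is one hedged way to fill in the midwest template. The variable choices (area, poptotal, state, popdensity), the size range, and the "Set1" palette are assumptions for illustration. Since the points map a variable to fill, this sketch uses scale_fill_brewer() so that the palette actually applies to them.

```r
library(ggplot2)

p_midwest <- ggplot(data = midwest,
                    mapping = aes(x = area, y = poptotal)) +
  geom_point(aes(fill = state, size = popdensity),
             shape = 21, color = "white") +    # shape 21 has a fill and an outline
  geom_smooth(aes(x = area, y = poptotal)) +
  labs(x = "Area",
       y = "Total population",
       title = "Area and population of midwest counties",
       subtitle = "Point size shows population density",
       caption = "Source: midwest data in ggplot2") +
  scale_fill_brewer(palette = "Set1") +
  scale_size(range = c(1, 8)) +                # smallest and largest point size
  guides(size = guide_legend(override.aes = list(col = "black")),
         fill = guide_legend(override.aes = list(size = 5)))

p_midwest
```

The guides() call only changes how the legend keys are drawn: black outlines for the size legend and larger points for the fill legend.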
ggplot(data = XX,
mapping = aes(XX, fill = XX)) +
geom_density(alpha = XX) +
labs(title = "XX",
subtitle = "XX",
caption = "XX",
x = "XX",
y = "XX",
fill = "XX")
geom_tile()
You can create heatplots with the geom_tile() function. Try creating a heatplot of statistics of NBA players using the following template:
# Read in nba data
nba_long <- read.csv("https://raw.githubusercontent.com/therbootcamp/therbootcamp.github.io/master/_sessions/_data/nba_long.csv")
ggplot(XX,
mapping = aes(x = XX, y = XX, fill = XX)) +
geom_tile(colour = "XX") +
scale_fill_gradientn(colors = c("XX", "XX", "XX"))+
labs(x = "XX",
y = "XX",
fill = "XX",
title = "NBA XX performance",
subtitle = "XX",
caption = "XX") +
coord_flip()
Create a plot of the personal savings rate (psavert) from the economics dataset.
Create a plot of the ACTG175 dataset that shows both boxplots and jittered raw data points. To do this, you’ll need to use both geom_boxplot() and geom_point(). To jitter the points, use the position argument to geom_point(), as well as the position_jitter() function to control how much to jitter the points.
midwest_IL <- midwest %>%
filter(state == "XX") %>%
mutate(popdensity_z = (popdensity - mean(popdensity)) / sd(popdensity)) %>%
arrange(desc(popdensity_z)) %>%
mutate(county = factor(county, levels = county)) %>%
slice(1:25)
ggplot(XX, aes(x = XX, y = XX)) +
geom_segment(aes(y = 0,
x = county,
yend = popdensity_z,
xend = county,
col = popdensity_z), size = XX) +
geom_point(size = XX, fill = "white", shape = 21) +
labs(title = "XX",
subtitle = "XX",
y = "XX",
x = "XX") +
ylim(XX, XX) +
scale_colour_gradient(low = "XX", high = "XX", limits = c(-.1, 9)) +
coord_flip() +
geom_text(aes(label = 1:25)) +
guides(col = "none") + # "none" replaces FALSE, which is deprecated since ggplot2 3.3.4
theme_XX() +
theme(panel.grid = element_blank())
The code to create mpg_agg, a dataframe containing aggregated data from the mpg dataframe, is below. Once you’ve created mpg_agg, create a heat plot using geom_tile().
# Calculate mean highway miles per gallon for each combination of
# manufacturer and class
mpg_agg <- mpg %>%
group_by(manufacturer, class) %>%
summarise(
hwy_mean = mean(hwy)
)
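A possible heat plot built from mpg_agg is sketched below. The color choices, label texts, and the object name p_heat are assumptions, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Mean highway mpg for each combination of manufacturer and class
mpg_agg <- mpg %>%
  group_by(manufacturer, class) %>%
  summarise(hwy_mean = mean(hwy), .groups = "drop")

# One tile per manufacturer/class combination, colored by mean highway mpg
p_heat <- ggplot(data = mpg_agg,
                 mapping = aes(x = manufacturer, y = class, fill = hwy_mean)) +
  geom_tile(colour = "white") +                      # white borders between tiles
  scale_fill_gradientn(colours = c("red", "white", "blue")) +
  labs(x = "Manufacturer",
       y = "Class",
       fill = "Mean highway mpg",
       title = "Highway miles per gallon by manufacturer and class",
       caption = "Source: mpg data in ggplot2") +
  coord_flip()

p_heat
```

Combinations of manufacturer and class that do not occur in mpg have no row in mpg_agg, so those tiles are simply left empty.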
Many of the plots in this practical were taken from Selva Prabhakaran’s website http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
For making maps with ggplot, check out Eric Anderson’s tutorial at http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html